TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domains
Authors
Abstract
Large amounts of parallel corpora are required for building Statistical Machine Translation (SMT) systems. We describe the TransDoop system, which gathers translations for creating parallel corpora from an online crowd workforce whose members are familiar with multiple languages but are not expert translators. Our system uses a Map-Reduce-like approach to translation crowdsourcing in which sentence translation is decomposed into the following smaller tasks: (a) translation of the constituent phrases of the sentence; (b) validation of the quality of the phrase translations; and (c) composition of complete sentence translations from the phrase translations. TransDoop incorporates quality control mechanisms and easy-to-use worker interfaces designed to address issues with translation crowdsourcing. We have evaluated the crowd's output using the METEOR metric. For a complex domain such as judicial proceedings, the higher scores obtained by the map-reduce based approach, compared to complete sentence translation, establish the efficacy of our work.
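The three-stage decomposition described in the abstract can be illustrated with a minimal Python sketch. This is not the TransDoop implementation: in the system described, each stage is carried out by crowd workers, whereas here workers are modelled as plain callables, validation is approximated by a simple majority vote, and composition is approximated by concatenation. All function names, the fixed-length phrase splitting, and the parameters `max_words` and `min_votes` are assumptions made for illustration only.

```python
from collections import Counter


def split_into_phrases(sentence, max_words=3):
    """Split a sentence into consecutive chunks of at most max_words words.

    Hypothetical stand-in for phrase extraction; the source does not say
    how constituent phrases are obtained.
    """
    words = sentence.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def map_translate(phrase, workers):
    """Map step: collect one candidate translation per worker."""
    return [worker(phrase) for worker in workers]


def validate(candidates, min_votes=2):
    """Validation step, approximated here by majority vote.

    Returns the most frequent candidate if it has at least min_votes
    votes, otherwise None.
    """
    best, votes = Counter(candidates).most_common(1)[0]
    return best if votes >= min_votes else None


def reduce_compose(phrase_translations):
    """Reduce step: join validated phrase translations in order.

    Concatenation is an assumption; phrases that failed validation
    (None) are skipped.
    """
    return " ".join(t for t in phrase_translations if t is not None)


def translate_sentence(sentence, workers, max_words=3, min_votes=2):
    """Run the split, map, validate, and compose stages for one sentence."""
    phrases = split_into_phrases(sentence, max_words)
    validated = [validate(map_translate(p, workers), min_votes)
                 for p in phrases]
    return reduce_compose(validated)
```

Calling `translate_sentence(sentence, workers)` with a list of worker callables returns the composed sentence. The point of the sketch is only that short phrases are easier to translate and cross-check independently than a long sentence from a complex domain, which is the motivation the abstract gives for the decomposition.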
Similar resources
IRT-based Aggregation Model of Crowdsourced Pairwise Comparison for Evaluating Machine Translations
Recent work on machine translation has used crowdsourcing to reduce the cost of manual evaluation. However, crowdsourced judgments are often biased and inaccurate. In this paper, we present a statistical model that aggregates many manual pairwise comparisons to robustly measure a machine translation system's performance. Our method applies the graded response model from item response theory (IRT), wh...
Composition operators between growth spaces on circular and strictly convex domains in complex Banach spaces
Let $\Omega_X$ be a bounded, circular and strictly convex domain in a complex Banach space $X$, and $\mathcal{H}(\Omega_X)$ be the space of all holomorphic functions from $\Omega_X$ to $\mathbb{C}$. The growth space $\mathcal{A}^\nu(\Omega_X)$ consists of all $f\in\mathcal{H}(\Omega_X)$ such that $$|f(x)|\leqslant C \nu(r_{\Omega_X}(x)),\quad x\in \Omega_X,$$ for some constant $C>0$...
Selected Crowdsourced Translation Practices
This paper contains research related to workflow and design patterns. It briefly discusses the suitability of industry tools for crowdsourcing processes in terms of workflow pattern support. After listing a number of practices identified by analysing crowdsourced translation workflow models, the paper discusses four of the practices and presents two recommendations based on the scenarios of rea...
Attacks and Defenses in Crowdsourced Mapping Services
Real-time crowdsourced maps such as Waze provide timely updates on traffic, congestion, accidents and points of interest. In this paper, we demonstrate how lack of strong location authentication allows creation of software-based Sybil devices that expose crowdsourced map systems to a variety of security and privacy attacks. Our experiments show that a single Sybil device with limited resources ...